18 research outputs found

    NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks

    Get PDF
    The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is in certain applications densely populated. This makes computations, in particular matrix-vector products, with the graph Laplacian a hard task. A typical application is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible as the number of nodes in the graph is too large. We propose the use of the fast summation based on the nonequispaced fast Fourier transform (NFFT) to perform the dense matrix-vector product with the graph Laplacian fast without ever forming the whole matrix. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated multiplication into Lanczos-based eigenvalues routines or iterative linear system solvers and even consider other than the standard Gaussian kernels. We illustrate the feasibility of our approach on a number of test problems from image segmentation to semi-supervised learning based on graph-based PDEs. In particular, we compare our approach with the Nystr\"om method. Moreover, we present and test an enhanced, hybrid version of the Nystr\"om method, which internally uses the NFFT.Comment: 28 pages, 9 figure

    Run-Time Reconfiguration for HyperTransport coupled FPGAs using ACCFS

    Get PDF
    In this paper we present a solution where only one FPGA is needed in a host coupled system, in which the FPGA can be reconfigured by a user application during run-time without loosing the host link connection. A hardware infrastructure on the FPGA and the software framework ACCFS (ACCelerator File System) on the host system is provided to the user which allow easy handling of reconfiguration and communication between the host and the FPGA. Such a system can be used for offloading compute kernels on the FPGA in high performance computing or exchanging functionality in highly available systems during run-time without loosing the host link during reconfiguration. The implementation was done for a HyperTransport coupled FPGA. The design of a HyperTransport cave was extended in such a way that it provides an infrastructure for run-time reconfigurable (RTR) modules

    OpenMP parallelization in the NFFT software library

    Get PDF
    software library and present the used parallelization approaches. Besides the NFFT kernel, the NFFT on the two-sphere and the fast summation based on NFFT are also parallelized. Thereby, the parallelization is based on OpenMP and the multi-threaded FFTW library. Furthermore, benchmarks for various cases are performed. The results show that an efficiency higher than 0.50 and up to 0.79 can still be achieved at 12 threads. 1 Overview The NFFT3 library [3] and its MATLAB interface were parallelized using OpenMP [7]. Both the non-parallel version and the multi-thread OpenMP version of the NFFT3 library provide an identical Application Programming Interface (API). This is realized by using distinct library files for both versions. The non-parallel version of the NFFT3 library can be found in libnfft3.so and libnfft3.a, the multi-thread OpenMP version in libnfft3_threads.so and libnfft3_threads.a. For the MATLAB interface, the user has to specifiy at compile time whether the non-parallel or multi-thread OpenMP version should be built. The following kernels of the NFFT3 library were parallelized using OpenMP: • kernel/nfft: – NDFT (nonequidistant discrete Fourier transform
    corecore